8 research outputs found

    Modelling spatio-temporal human behaviour with mobile phone data : a data analytical approach

    From one-class to two-class classification by incorporating expert knowledge : novelty detection in human behaviour

    One-class classification is the standard procedure for novelty detection. Novelty detection aims to identify observations that deviate from a determined normal behaviour. Only instances of one class are known, whereas so-called novelties are unlabelled. Traditional novelty detection applies methods from the field of outlier detection. These standard one-class classification approaches have limited performance in many real business cases. The traditional techniques were mainly developed for industrial problems such as machine condition monitoring. When they are applied to human behaviour, performance drops significantly. This paper proposes a method that improves existing approaches by creating semi-synthetic novelties in order to have labelled data for the two classes. Expert knowledge is incorporated in the initial phase of this data generation process. The method was deployed on a real-life test case where the goal was to detect fraudulent subscriptions to a telecom family plan. This research demonstrates that the two-class expert model outperforms a one-class model on the semi-synthetic dataset. In the next step, the model was validated on a real dataset. The company's fraud detection team manually checked the top predicted novelties. The results show that incorporating expert knowledge to transform a one-class problem into a two-class problem is a valuable method.

    Home location prediction with telecom data : benchmarking heuristics with a predictive modelling approach

    Correctly identifying the home location is crucial for human mobility analysis with telecom data, more specifically call detail record (CDR) data. To that end, multiple heuristics have been developed in the literature. Nevertheless, due to the lack of ground truth home location data, no study has thoroughly validated these widely used methods so far. We present a detailed performance analysis of existing home detection heuristics, using a unique dataset that enables this important validation at the lowest level, that of the cell tower. Our research indicates that simple heuristics surprisingly outperform their more complex counterparts. The benchmark study revealed that the best heuristic is able to identify the home location with an average error of approximately 4.5 km and selects the correct home tower in 60.69% of the cases. Based on the insights provided by our study, we propose a new heuristic that increases the accuracy to 61% and lowers the average distance error to 4.365 km. Secondly, if the home location is known for possibly only a fraction of the instances, we propose a labelled predictive modelling approach. Adding social network based variables to this predictive model further enhances the predictive performance. Our best model reduces the average distance error to 2.848 km and selects the correct home location in 72.08% of the cases. Furthermore, this result provides an indication of the upper bound for home detection with CDR data. Finally, models that only make use of social network based data are developed as well. Results show that even without using data of the focal individual, these models are able to select the correct home tower in 37.65% of the cases and achieve an average distance error of 8.1 km.
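
    The abstract does not spell out which heuristics were benchmarked. As a minimal sketch, assuming one commonly used rule (the most frequently used tower during night hours) and a hypothetical record layout of (timestamp, tower_id) per user, a home detection heuristic and the distance-error metric could look like this. The function names, the night window, and the record format are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def predict_home_tower(records, night_start=19, night_end=7):
    """Return the tower used most often at night for one user.

    records: iterable of (timestamp: datetime, tower_id) pairs (assumed layout).
    The 19:00-07:00 night window is an assumption, not taken from the paper.
    """
    records = list(records)
    night = [tower for ts, tower in records
             if ts.hour >= night_start or ts.hour < night_end]
    # Fall back to all activity when the user has no night-time records.
    pool = night or [tower for _, tower in records]
    if not pool:
        return None
    return Counter(pool).most_common(1)[0][0]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, usable as the distance error between towers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


# Hypothetical records: tower "B" dominates by day, tower "A" at night.
sample = (
    [(datetime(2020, 1, 1, 23, 0), "A")] * 3
    + [(datetime(2020, 1, 1, 14, 0), "B")] * 5
)
home = predict_home_tower(sample)
```

    With the sample above, the night-time rule picks tower "A" even though tower "B" has more records overall, which is the kind of behaviour such heuristics are designed to capture.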

    From one-class to two-class classification by incorporating expert knowledge

    In certain business cases the aim is to identify observations that deviate from an identified normal behaviour. It is often the case that only instances of the normal class are known, whereas so-called novelties are undiscovered. Novelty detection or anomaly detection approaches usually apply methods from the field of outlier detection. However, anomalies are not always outliers and outliers are not always anomalies. The standard one-class classification approaches therefore underperform in many real business cases. Drawing upon literature about incorporating expert knowledge, we come up with a new method that significantly improves the predictive performance of a one-class model. Combining the available data and expert knowledge about potential anomalies enables us to create synthetic novelties. The latter are incorporated into a standard two-class predictive model. Based on a telco dataset, we show that our synthetic two-class model clearly outperforms a standard one-class model on the synthetic dataset. In the next step, the model was applied to real data. Top identified novelties were manually checked by experts. The results indicate that incorporating expert knowledge to transform a one-class problem into a two-class problem is a valuable method.

    Bluetooth tracking of humans in an indoor environment: An application to shopping mall visits

    Intelligence about the spatio-temporal behaviour of individuals is valuable in many settings. Generating tracking data is a necessity for this analysis and requires an appropriate methodology. In this study, the applicability of Bluetooth tracking in an indoor setting is investigated. A wide variety of applications can benefit from indoor Bluetooth tracking. This paper examines the value of the method in a marketing application. A Belgian shopping mall served as a real-life test setting for the methodology. A total of 56 Bluetooth scanners registered 18,943 unique MAC addresses during a 19-day period. The results indicate that Bluetooth tracking is a sound approach for capturing tracking data, which can be used to map and analyse the spatio-temporal behaviour of individuals. The methodology also provides a more efficient and more accurate way of obtaining a variety of relevant metrics in the field of consumer behaviour research. Bluetooth tracking can be implemented as a new and cost-effective practice for marketing research that provides fast and accurate results and insights. We conclude that Bluetooth tracking is a viable approach, but that certain technological and practical aspects need to be considered when applying Bluetooth tracking in new cases.
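
    The abstract mentions consumer behaviour metrics without listing them. As a rough sketch, assuming a hypothetical detection log of (mac, scanner_id, unix_seconds) tuples, two simple metrics (visit duration and number of distinct scanners passed per device) could be derived as follows. The log format and the metric names are assumptions for illustration only.

```python
from collections import defaultdict


def summarise_detections(detections):
    """Summarise scanner detections per device.

    detections: iterable of (mac, scanner_id, unix_seconds) tuples (assumed layout).
    Returns {mac: {"visit_seconds": ..., "scanners_seen": ...}}.
    """
    first_last = {}                      # mac -> (first seen, last seen)
    scanners_by_mac = defaultdict(set)   # mac -> set of scanner ids
    for mac, scanner, t in detections:
        lo, hi = first_last.get(mac, (t, t))
        first_last[mac] = (min(lo, t), max(hi, t))
        scanners_by_mac[mac].add(scanner)
    return {
        mac: {
            "visit_seconds": hi - lo,
            "scanners_seen": len(scanners_by_mac[mac]),
        }
        for mac, (lo, hi) in first_last.items()
    }


# Hypothetical log: device "aa" passes two scanners ten minutes apart.
log = [("aa", 1, 0), ("aa", 2, 600), ("bb", 1, 100)]
summary = summarise_detections(log)
unique_devices = len(summary)
```

    A real deployment would also need to split a device's detections into separate visits and account for devices that are not discoverable, which this sketch ignores.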

    Profiling the aerobic window of horses in response to training by means of a modified lactate minimum speed test : flatten the curve

    There is a great need for objective external training load prescription and performance capacity evaluation in equestrian disciplines. Therefore, reliable standardised exercise tests (SETs) are needed. Classic SETs require maximum intensities with associated risks to deduce training loads from pre-described cut-off values. The lactate minimum speed (LMS) test could be a valuable alternative. Our aim was to compare new performance parameters of a modified LMS-test with those of an incremental SET, to assess the effect of training on LMS-test parameters and curve-shape, and to identify the optimal mathematical approach for LMS-curve parameters. Six untrained standardbred mares (3–4 years) performed a SET and LMS-test at the start and end of the 8-week harness training. The SET-protocol contains 5 increments (4 km/h; 3 min/step). The LMS-test started with a 3-min trot at 36–40 km/h [until blood lactate (BL) > 5 mmol/L] followed by 8 incremental steps (2 km/h; 3 min/step). The maximum lactate steady state estimation (MLSS) entailed >10 km run at the LMS and 110% LMS. The GPS, heart rate (Polar®), and BL were monitored and plotted. Curve-parameters (R Core Team, 3.6.0) were (SET) VLa1.5/2/4 and (LMS-test) area under the curve (AUC>/<LMS), LMS and Aerobic Window (AW) via angular vs. threshold method. Statistics for comparison: a paired t-test was applied, except for LMS: paired Wilcoxon test; (p < 0.05). The Pearson correlation (r > 0.80), Bland-Altman method, and ordinary least products (OLP) regression analyses were determined for test-correlation and concordance. Training induced a significant increase in VLa1.5/2/4. The width of the AW increased significantly while the AUC</>LMS and LMS decreased post-training (flattening U-curve). The LMS BL steady-state is reached earlier and maintained longer after training. BLmax was significantly lower for LMS vs. SET. The 40° angular method is the optimal approach. The correlation between LMS and VMLSS was significantly better compared to the SET. The VLa4 is unreliable for equine aerobic capacity assessment. The LMS-test allows more reliable individual performance capacity assessment at lower speed and BL compared to SETs. The LMS-test protocol can be further adapted, especially post-training; however, inducing modest hyperlactatemia prior to the incremental LMS-stages and omitting inclusion of a per-test recovery contributes to its robustness. This LMS-test is a promising tool for the development of tailored training programmes based on the AW, respecting animal welfare.
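
    The abstract does not describe how the LMS itself was computed from the U-shaped curve. One common way to locate a lactate minimum, assumed here purely for illustration, is to fit a second-order polynomial to the (speed, blood lactate) points of the incremental stages and take its vertex. The sketch below does this with the standard library only; the function names and the sample values are hypothetical and are not data from the study, and this is not the 40° angular method mentioned above.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]               # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal equations for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [u - f * v for u, v in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def lactate_minimum_speed(speeds, lactates):
    """Return (speed, lactate) at the vertex of the fitted U-shaped curve."""
    mean = sum(speeds) / len(speeds)
    centred = [v - mean for v in speeds]      # centring improves conditioning
    a, b, c = fit_quadratic(centred, lactates)
    if a <= 0:
        raise ValueError("fitted curve is not U-shaped")
    x0 = -b / (2 * a)
    return mean + x0, c - b * b / (4 * a)


# Hypothetical stage data: 8 steps of 2 km/h with a synthetic U-shaped response.
speeds = [22, 24, 26, 28, 30, 32, 34, 36]
lactates = [0.05 * (v - 30) ** 2 + 2.0 for v in speeds]
lms, bl_at_lms = lactate_minimum_speed(speeds, lactates)
```

    Because the sample lactate values are generated from an exact parabola, the fitted minimum falls at the speed used to build them; real stage data would be noisy and the fit only approximate.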